A dual assortative measure of community structure 
Current community detection algorithms operate by optimizing a statistic called modularity, which analyzes the distribution of positively weighted edges in a network. Modularity does not account for negatively weighted edges. This paper introduces a dual assortative modularity measure (DAMM) that incorporates both positively and negatively weighted edges. We describe the DAMM statistic and illustrate its utility in a community detection algorithm. We evaluate the efficacy of the algorithm on both computer generated and real-world networks, showing that DAMM broadens the domain of networks that can be analyzed by community detection algorithms.



PACS numbers: 89.75.Fb, 89.75.Hc



The problem of detecting community structure within complex networks has received considerable attention in recent literature [...]. Given a network of nodes and edges, the challenge is to group nodes into communities according to the distribution of edges. There exist many possible ways to define community mathematically. One widely accepted definition, known as modularity, defines a community to be a group of nodes that are more densely connected than would be expected if the edges had been assigned at random. This definition assumes that each edge has a positive weight.

A common example of such a network is a friendship network. Nodes of the friendship network represent people, and the edges, which are positively weighted, represent friendships. Intuitively, communities are composed of sub-graphs whose nodes are densely connected to one another but sparsely connected to the rest of the network. The term assortativity [...] refers to the tendency for nodes to be connected to others that are like, or unlike, them. In the case of the friendship network, communities are based on positive assortativity because nodes are connected to others with whom they share a positive connection (friendship). Panel A of Figure 1 depicts a friendship network, where the solid edges are positively weighted and represent friendships.

In this paper, we incorporate the concept of negative assortativity, or disassortativity, into the definition of community. Nodes are negatively assortative if their connection is based on dissimilarity rather than likeness. With regard to the friendship network, negatively weighted edges represent the strength of adversarial relationships. We refer to a network that contains only negatively weighted edges as an adversarial network. As previously described, all of the edges in the network shown in panel A of Figure 1 are based on friendships. However, let us assume that this friendship information is unavailable and that instead a list of adversarial relationships between nodes is provided. Further, assume that all pairings in the original network that did not share friendships are now considered adversaries. The resulting adversarial network is presented in panel B of Figure 1. The dashed edges indicate negative weights. The two networks, the friendship network (top left) and the adversarial network (top right), provide similar but different information. In general, an adversarial network need not be the complement of the friendship network.

In this paper, we combine the two concepts, positive and negative assortativity, to form a single definition of community. Because this definition incorporates the contributions of both negative and positive relationships, we refer to it as dual assortative. The networks on the bottom of Figure 1 illustrate this duality. They contain both positive relationships (solid edges) and negative relationships (dashed edges). Such networks may be fully connected, as is the case for the network in panel C of Figure 1. More commonly, however, only a fraction of the possible relationships between nodes may be known (panel D of Figure 1). An example of a dual assortative network is one in which the edge weights are based on a similarity measure, such as correlation, which can assume either a positive or negative value. Consider a network where the nodes represent financial traders and the edge weights indicate the correlation of trading behavior between a pair of traders. The dual assortative modularity measure (DAMM) definition incorporates all available information, positive and negative, to assess the strength of community structure.

Intuitively, there exists an asymmetry in the information provided by positive and negative edges. A friendship between two people conveys a stronger bond than sharing a common adversary. However, this is only true when there are three or more communities. When only two real communities exist, a negative edge provides the same amount of information as a positive edge. Consider the case of two communities, $C_a$ and $C_b$: if two nodes share a negative edge, one node should belong to $C_a$ and the other to $C_b$. In the case of three or more communities, however, the negative edge simply indicates that the nodes should reside in separate communities; it does not indicate to which particular communities the nodes should belong. The information provided by a positive edge is therefore more specific than that provided by a negative edge.

The remainder of the paper is organized as follows. In Section I, the mathematical framework for the dual assortative measure is introduced and explained, and a community detection algorithm used for optimizing the DAMM is described. In Section II, we assess the efficacy of optimizing the DAMM on both computer generated and real networks, and Section III summarizes and concludes the paper.
[Figure 1 panels: (A) positive assortativity; (B) negative assortativity; (C) dual assortativity (full representation); (D) dual assortativity (partial representation).]

FIG. 1: Comparison of networks with different types of assortativity. 
Solid edges denote positive edge weights; dashed edges denote nega- 
tive edge weights. The network in panel A portrays positive assorta- 
tivity (PA) and provides an example of a friendship network. In panel 
B, the network strictly contains negative edges and depicts negative 
assortativity (NA). We refer to this type of network as an adversar- 
ial network. The network in panel C exemplifies a fully connected 
dual assortativity (DA) network. Here, both positive and negative 
edges are present. In panel D, a partially connected dual assorta- 
tive network is illustrated. The dual assortative modularity measure 
(DAMM) can be used to assess community structure in all of the 
above cases. The partially connected DA network in the bottom right (panel D) is the most general, and it is this type of network that we study in Section II.

I. METHODS 

This section introduces the mathematics of the dual assor- 
tative modularity measure (DAMM), describes the extremal 
optimization algorithm for optimizing the DAMM on a given 
network, and describes a measure, called communal overlap, 
which we use to quantify the fidelity of a community detection 
result given that the real communities are known and available 
for comparison. 

A. The dual assortative modularity measure 

Before describing the dual assortative modularity measure 
(DAMM), we review the original modularity measure, which 
provides the foundation for the DAMM. We then show how to 
quantify the negative assortative contributions of a network. 
The positive and negative components are then combined to 
establish the DAMM. Finally, we introduce the algorithm used 
to optimize the DAMM on a network. 



1. Positive assortativity 

We denote the weight of the edge between node $r$ and node $s$ as $e_{rs}$. For simplicity of explanation, we consider only networks with edge weights $e_{rs} \in \{-1, 0, 1\}$, with $e_{rs} = 0$ implying that the edge is not present. In practice, however, $e_{rs}$ will often be a real number, $e_{rs} \in \mathbb{R}$. Given a set of communities, denoted $C$, we use $w_{ij} = \sum_{r \in C_i} \sum_{s \in C_j} e_{rs}$ to denote the cumulative edge weight between community $i$ and community $j$.

Equation (1) gives the original modularity measure [...], denoted $Q^+$. The implied edge weight domain is $e_{rs} \in \{0, 1\}$, which is analogous to an unweighted network. The $w_{ii}$ term represents twice the number of edges in which both ends terminate at nodes belonging to community $i$. Further, $a_i = \sum_j w_{ij}$ gives the sum of all edge weights with at least one end attached to a node residing in community $i$, and $T = \sum_i \sum_j w_{ij}$ is twice the total number of edges in the network. We use the term spoke to refer to the terminal end of an edge. With reference to $a_i$, we count the number of spokes connected to community $C_i$. Similarly, $T$ refers to the total number of spokes in the network.

$$Q^+ = \sum_{i=0}^{|C|-1} \left[ \frac{w_{ii}}{T} - \left( \frac{a_i}{T} \right)^2 \right] \tag{1}$$
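To make the bookkeeping of $w_{ij}$, $a_i$ and $T$ concrete, the following minimal Python sketch evaluates equation (1) for a small unweighted network. The edge dictionary keyed by node pairs and the function names are illustrative choices, not part of the original method; each edge is counted from both of its ends, so $w_{ii}$ and $T$ count spokes exactly as described above.

```python
def community_weights(edges, membership, k):
    """Return w[i][j], the cumulative edge weight between communities i and j.

    edges: dict mapping an unordered node pair (r, s) to its weight e_rs.
    membership: membership[r] is the community index of node r.
    Each edge is counted from both of its ends (spokes), so w[i][i] is
    twice the intra-community weight, as in the text.
    """
    w = [[0.0] * k for _ in range(k)]
    for (r, s), e in edges.items():
        ci, cj = membership[r], membership[s]
        w[ci][cj] += e
        w[cj][ci] += e
    return w


def modularity(edges, membership, k):
    """Original modularity Q+ of equation (1), for non-negative weights."""
    w = community_weights(edges, membership, k)
    T = sum(sum(row) for row in w)  # total number of spokes
    q = 0.0
    for i in range(k):
        a_i = sum(w[i])  # spokes attached to community i
        q += w[i][i] / T - (a_i / T) ** 2
    return q


# Two triangles joined by a single bridge edge (2, 3).
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
         (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
print(modularity(edges, [0, 0, 0, 1, 1, 1], 2))  # 5/14, about 0.357
```

Placing all six nodes in one community gives $w_{00} = a_0 = T$ and hence $Q^+ = 0$, which illustrates why the expectation term is needed.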

The DAMM is given in equation (5). It uses a modified form of equation (1) to compute the contribution of positively weighted edges. Specifically, we redefine $a_i$ as $a^+_i = \sum_j H(w_{ij})\,w_{ij}$, where $H(x)$ is the unit step function. This modification ensures that $a^+_i$ incorporates only the contributions of positively weighted edges; likewise, $w^+_{ii}$ denotes the positive edge weight encapsulated by community $i$. We also redefine $T$ as $T = \sum_i \sum_j |w_{ij}|$, so that both positively and negatively weighted edges contribute to the total weight $T$.

The summation in equation (1) iterates through the set of communities. For each community, the difference $(w_{ii}/T) - (a_i/T)^2$ reflects the strength of that particular community. The first term, $w_{ii}/T$, represents the fraction of intra-communal edges among all edges in the network. An edge is considered to be intra-communal if both ends are connected to nodes residing in the same community. One could mistakenly assume that the higher this fraction, the greater the strength of the community. If that were so, however, the fraction would be maximized by a single community containing all nodes within the network. Thus, we compare the fraction found in the first term to its expected value, $(a_i/T)^2$. The ratio $a_i/T$ represents the fraction of all edge spokes that are connected to the given community, and thus its square gives the expected fraction of edges with both spokes in that community if the edges were placed at random. If the difference is positive, the observed fraction is greater than what would be expected if the edges were placed randomly. The greater the (positive) difference, the greater the communal strength. If the difference is negative, the communal strength is weaker than the expectation, suggesting no communal structure for the community being investigated.



2. Negative assortativity 

This measure is motivated by the idea that a shared adversary represents a commonality. In other words, if Jack and Jill are both adversaries of Alice, they share a commonality regardless of whether the pair are friends. In the case of positive weights, we quantified how much the fraction of edges encapsulated by a community differed from the expected value under random edge assignment. Here, the scenario is reversed. Adversarial relationships within a community are not desirable. In a scenario of perfect community structure, all negative edges would occur between communities.

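Equation (2) defines the negative assortative component of the DAMM. The equation resembles equation (1), except that the order of the terms is reversed and we consider negatively weighted edges. Here, $a^-_i$ represents the cumulative negative edge weight, taken as a magnitude, connected to nodes of community $i$. In other words, $a^-_i = \sum_j [1 - H(w_{ij})]\,|w_{ij}|$, where $H(x)$ is the unit step function. Similarly, $w^-_{ii}$ represents the magnitude of the cumulative negative edge weight encapsulated by community $i$. The first term of equation (2) provides the null-model expectation against which the observed intra-communal negative weight is tested: a community scores well when it encapsulates less negative weight than random placement would predict.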

$$Q^- = \sum_{i=0}^{|C|-1} \left[ \left( \frac{a^-_i}{T} \right)^2 - \frac{w^-_{ii}}{T} \right] \tag{2}$$
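Both components can be accumulated in a single pass over the edges, and their sum is the DAMM of equation (5). The Python sketch below is illustrative only: it assumes edges stored in a dictionary keyed by node pairs, and it accumulates positive weights and negative-weight magnitudes edge by edge, which coincides with the $H(w_{ij})$ form above whenever no pair of communities mixes positive and negative edges.

```python
def damm(edges, membership, k):
    """Dual assortative modularity Q^D = Q+ + Q- (equations (1), (2) and (5)).

    edges: dict mapping an unordered node pair (r, s) to a signed weight e_rs.
    Positive weights and negative-weight magnitudes are accumulated
    separately, edge by edge (an assumption; see the lead-in).
    """
    w_pos = [[0.0] * k for _ in range(k)]
    w_neg = [[0.0] * k for _ in range(k)]
    for (r, s), e in edges.items():
        if e == 0:
            continue
        w = w_pos if e > 0 else w_neg
        ci, cj = membership[r], membership[s]
        w[ci][cj] += abs(e)
        w[cj][ci] += abs(e)
    # T counts positive and negative spokes alike: T = sum_ij |w_ij|
    T = sum(map(sum, w_pos)) + sum(map(sum, w_neg))
    q_pos = q_neg = 0.0
    for i in range(k):
        a_pos, a_neg = sum(w_pos[i]), sum(w_neg[i])
        q_pos += w_pos[i][i] / T - (a_pos / T) ** 2  # redefined equation (1)
        q_neg += (a_neg / T) ** 2 - w_neg[i][i] / T  # equation (2)
    return q_pos + q_neg


# Friends within {0, 1} and {2, 3}; every cross pair are adversaries.
edges = {(0, 1): 1, (2, 3): 1,
         (0, 2): -1, (0, 3): -1, (1, 2): -1, (1, 3): -1}
print(damm(edges, [0, 0, 1, 1], 2))  # intended split: about 0.5
print(damm(edges, [0, 1, 0, 1], 2))  # mixed split: about -0.167
```

With no negative edges the negative sums vanish and the function returns the same value as equation (1).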



3. Dual assortative modularity measure (DAMM)

To establish the DAMM, we combine the negative assortativity contribution of equation (2) with the positive assortativity contribution of equation (1). We define the DAMM as follows:

$$Q^D = Q^+ + Q^- \tag{3}$$

$$Q^D = \sum_{i=0}^{|C|-1} \left[ \frac{w^+_{ii}}{T} - \left( \frac{a^+_i}{T} \right)^2 \right] + \sum_{i=0}^{|C|-1} \left[ \left( \frac{a^-_i}{T} \right)^2 - \frac{w^-_{ii}}{T} \right] \tag{4}$$

$$Q^D = \sum_{i=0}^{|C|-1} \left[ \frac{w^+_{ii} - w^-_{ii}}{T} - \frac{(a^+_i)^2 - (a^-_i)^2}{T^2} \right] \tag{5}$$

In the absence of negative edges, $w^-_{ii} = 0$ and $a^-_i = 0$ for all $i$, and the DAMM reduces to equation (1).

B. Optimizing DAMM with the extremal optimization algorithm

We use extremal optimization (EO) [...] to optimize the DAMM statistic and detect communities in a network. EO is known to be effective for community detection using the original modularity measure, which is based solely on positive assortativity [...]. EO is a divisive approach, in which all nodes are initially placed in a single community. Thereafter, each community is divided recursively into two communities, not necessarily of the same size. At each step, the division found to provide the largest increase in modularity is applied, provided that the increase is positive. If the best division does not increase modularity, the community is declared indivisible. When all existing communities are found to be indivisible, the algorithm halts.

Each division proceeds as follows. Initially, the nodes are randomly assigned to one of two partitions. After all nodes have been assigned, the DAMM is computed. Thereafter, a single node is migrated from one partition to the other, and the DAMM is recomputed by adding the $\Delta Q^D$ associated with the migrated node (Section I.B.1).

A counter, denoted $K$, tracks the number of moves since the last DAMM improvement. If the DAMM fails to improve, the counter is incremented. Otherwise, the counter is reset to zero, and the partitioning is recorded along with its associated DAMM value; this partitioning represents the best detected configuration. The process continues until the counter reaches a predetermined threshold. For each division, the size of the community, denoted $|C_i|$, determines the stopping criterion: the maximum allowable number of consecutive non-improving moves is $S = \alpha |C_i|$ ($\alpha = 3$ in the experiments of Section II). Once the counter reaches the threshold, $K = S$, the process terminates and the best detected configuration is retained. If the split has improved the DAMM value, the global set of communities is updated to reflect the division. Otherwise, the community remains unchanged and is marked indivisible.

1. Calculating $\Delta Q^D$



An important component of the EO algorithm involves choosing which nodes to migrate. Rather than choosing nodes at random, we associate a value $\Delta Q^{D,u}$ with each node $u$. This approach differs slightly from that of [5], which uses a heuristic approximation of $\Delta Q$ rather than the exact difference. The value $\Delta Q^{D,u}$ represents the change in DAMM that occurs when the specified node is migrated. This method resembles the hill-climbing used in other settings and biases the search for an optimal division towards immediate improvements.

In practice, we maintain a list that associates a $\Delta Q^{D,u}$ value with each node $u$. To select a node for migration, we rank the list of $\Delta Q^{D,u}$ values and then probabilistically choose a node using a method known as $\tau$-EO [...]. Using this process, a node of rank $q$ is chosen with probability $P(q) \propto q^{-\tau}$, where

$$\tau = 1 + \frac{1}{\log |C_i|}.$$

Following the migration of a node, the $\Delta Q^{D,u}$ values are updated as described in Section I.B.2.
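The division procedure and the $\tau$-EO selection rule can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: rank 1 is taken to be the node with the largest $\Delta Q^{D,u}$ (the most immediate improvement), the logarithm in $\tau$ is assumed to be natural, and `score` is a hypothetical callback that returns the DAMM of a bisection. For brevity, every $\Delta Q^{D,u}$ is recomputed by brute force rather than with equations (7) and (8).

```python
import math
import random


def tau_eo_pick(delta_q, tau, rng):
    """Pick a node index by tau-EO rank selection.

    Nodes are ranked by decreasing Delta Q^{D,u}, so rank q = 1 is the
    largest immediate improvement; rank q is drawn with probability
    proportional to q ** (-tau).
    """
    order = sorted(range(len(delta_q)), key=lambda u: delta_q[u], reverse=True)
    weights = [q ** -tau for q in range(1, len(order) + 1)]
    return rng.choices(order, weights=weights, k=1)[0]


def eo_bisect(n, score, alpha=3, seed=None):
    """Sketch of one EO division of a community with n >= 2 nodes.

    score(side) must return the DAMM of the 0/1 assignment `side`
    (a hypothetical callback). Delta Q values are recomputed by brute force
    here, not with the incremental updates of equations (7) and (8).
    """
    if n < 2:
        raise ValueError("a division needs at least two nodes")
    rng = random.Random(seed)
    tau = 1.0 + 1.0 / math.log(n)       # assumes a natural logarithm
    side = [rng.randint(0, 1) for _ in range(n)]
    current = score(side)
    best, best_side = current, side[:]
    K, S = 0, alpha * n                 # stop after S moves without improvement
    while K < S:
        deltas = []
        for u in range(n):              # Delta Q^{D,u} for every node u
            side[u] ^= 1
            deltas.append(score(side) - current)
            side[u] ^= 1
        m = tau_eo_pick(deltas, tau, rng)
        side[m] ^= 1                    # migrate node m to the other partition
        current += deltas[m]
        if current > best:              # record the new best configuration
            best, best_side, K = current, side[:], 0
        else:
            K += 1
    return best, best_side
```

In the full algorithm, the best split returned by such a routine would be accepted only if it raises the global DAMM; otherwise the community is marked indivisible.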

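The calculation of $\Delta Q^{D,u}$ is given by equation (7), in which node $u$ migrates from community $l$ to community $g$. Here, $w^+_{gu}$ and $w^+_{lu}$ denote the cumulative positive edge weight between $u$ and the nodes of $g$ and of $l$ (excluding $u$ itself), respectively; $w^-_{gu}$ and $w^-_{lu}$ are the corresponding magnitudes of negative edge weight; and $d^+_u$ and $d^-_u$ are the positive and negative weighted degrees of $u$. The derivation is provided in Appendix A.

$$\Delta Q^{D,u} = \Delta Q^{+,u} + \Delta Q^{-,u} \tag{6}$$

$$\Delta Q^{D,u} = \frac{2}{T} \left[ w^+_{gu} - w^+_{lu} - \frac{d^+_u}{T} \left( a^+_g - a^+_l + d^+_u \right) \right] + \frac{2}{T} \left[ w^-_{lu} - w^-_{gu} + \frac{d^-_u}{T} \left( a^-_g - a^-_l + d^-_u \right) \right] \tag{7}$$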
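Because the positive and negative brackets of equation (7) differ only in sign, the formula is easy to check numerically. The Python sketch below evaluates equation (7) directly and compares it with the brute-force difference of equation (5) before and after a move. Storing positive weights and negative-weight magnitudes in separate per-node maps is our own representation, the function names are hypothetical, and no attempt is made at the $O(|C_i|)$ bookkeeping discussed in Section I.B.2.

```python
def split_by_sign(edges, n):
    """Per-node maps of positive weights and of negative-weight magnitudes.

    edges: dict mapping an unordered pair (r, s) to a signed weight; no self-loops.
    """
    pos = [{} for _ in range(n)]
    neg = [{} for _ in range(n)]
    for (r, s), e in edges.items():
        if e != 0:
            adj = pos if e > 0 else neg
            adj[r][s] = adj[s][r] = abs(e)
    return pos, neg


def total_spokes(pos, neg):
    """T: positive plus negative spoke weight over the whole network."""
    return sum(sum(nb.values()) for nb in pos + neg)


def damm(pos, neg, member, k):
    """Q^D of equation (5), used here only as a brute-force reference."""
    T = total_spokes(pos, neg)
    q = 0.0
    for adj, sign in ((pos, 1.0), (neg, -1.0)):
        a = [0.0] * k
        w_in = [0.0] * k
        for r, nb in enumerate(adj):
            for s, e in nb.items():
                a[member[r]] += e
                if member[s] == member[r]:
                    w_in[member[r]] += e
        # sign = -1 reverses the order of the terms, as in equation (2)
        q += sign * sum(w_in[i] / T - (a[i] / T) ** 2 for i in range(k))
    return q


def delta_q(u, c_g, pos, neg, member, k):
    """Equation (7): change in Q^D when node u moves to community c_g."""
    c_l = member[u]
    T = total_spokes(pos, neg)
    total = 0.0
    for adj, sign in ((pos, 1.0), (neg, -1.0)):
        a = [0.0] * k
        for r, nb in enumerate(adj):
            a[member[r]] += sum(nb.values())
        w_gu = sum(e for s, e in adj[u].items() if member[s] == c_g)
        w_lu = sum(e for s, e in adj[u].items() if member[s] == c_l)
        d_u = sum(adj[u].values())
        # the negative bracket of equation (7) is the positive one times -1
        total += sign * 2 / T * (w_gu - w_lu - d_u / T * (a[c_g] - a[c_l] + d_u))
    return total
```

For any signed network and any move, `delta_q(u, g, ...)` should agree with `damm` after the move minus `damm` before it, up to floating-point rounding.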



2. Calculating $\Delta^2 Q^D$

After each migration, all $\Delta Q^{D,u}$ values are subject to change. Rather than recompute the value for each node, a less computationally expensive approach makes use of $\Delta^2 Q^{D,u,m}$. Each value can be updated as $\Delta Q^{D,u}_{t+1} = \Delta Q^{D,u}_t + \Delta^2 Q^{D,u,m}_t$, where $\Delta^2 Q^{D,u,m}_t$ represents the change in $\Delta Q^{D,u}$ for node $u$ following the migration of node $m$ at time $t$.

As noted, computation of $\Delta^2 Q^{D,u,m}$ involves two nodes: the node $m$ that was migrated and the node $u$ for which $\Delta Q^{D,u}$ must be updated. Both nodes might move to the same community, or they might move in opposite directions (each community gains one node and loses the other). We use a direction indicator $D \in \{-1, 1\}$ to indicate how the nodes move. If they both move to the same community, $D = 1$; otherwise, $D = -1$.

The calculation of $\Delta^2 Q^{D,u,m}$ is given by equation (8), where $d^{\pm}_u$ and $d^{\pm}_m$ are the positive and negative weighted degrees of $u$ and $m$, and $e^+_{um}$ and $e^-_{um}$ are the positive weight and the magnitude of the negative weight of the edge between $u$ and $m$. The derivation is provided in Appendix B.

$$\Delta^2 Q^{D,u,m} = 4D \left[ \frac{d^-_u d^-_m - d^+_u d^+_m}{T^2} + \frac{e^+_{um} - e^-_{um}}{T} \right] \tag{8}$$
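The update rule can also be checked numerically. The following sketch assumes a two-way split stored as 0/1 labels and a symmetric signed weight matrix `e` with a zero diagonal (both our own representations); it computes $\Delta Q^{D,u}$ by brute force from equation (5) and compares $\Delta Q^{D,u} + \Delta^2 Q^{D,u,m}$ against the value recomputed after node $m$ migrates. The function names are illustrative.

```python
def damm(e, side):
    """Q^D of equation (5) for a two-way split; e is a symmetric signed matrix."""
    n = len(e)
    T = sum(abs(x) for row in e for x in row)
    q = 0.0
    for c in (0, 1):
        nodes = [r for r in range(n) if side[r] == c]
        a_pos = sum(max(e[r][s], 0) for r in nodes for s in range(n))
        a_neg = sum(max(-e[r][s], 0) for r in nodes for s in range(n))
        w_pos = sum(max(e[r][s], 0) for r in nodes for s in nodes)
        w_neg = sum(max(-e[r][s], 0) for r in nodes for s in nodes)
        q += (w_pos - w_neg) / T - (a_pos ** 2 - a_neg ** 2) / T ** 2
    return q


def delta_q(e, side, u):
    """Delta Q^{D,u} by brute force: flip u to the other partition."""
    moved = side[:]
    moved[u] ^= 1
    return damm(e, moved) - damm(e, side)


def second_difference(e, side, u, m):
    """Equation (8): change in Delta Q^{D,u} caused by migrating node m != u.

    `side` is the partition before m migrates.
    """
    T = sum(abs(x) for row in e for x in row)

    def deg_pos(r):
        return sum(max(x, 0) for x in e[r])

    def deg_neg(r):
        return sum(max(-x, 0) for x in e[r])

    D = 1 if side[u] == side[m] else -1   # same partition means same target
    # e[u][m] is e+_um - e-_um for a single signed edge
    return 4 * D * ((deg_neg(u) * deg_neg(m) - deg_pos(u) * deg_pos(m)) / T ** 2
                    + e[u][m] / T)
```

Equation (8) does not cover the migrated node itself; its new value is simply $-\Delta Q^{D,m}$, since moving it back undoes the migration.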



The cost of computing $\Delta^2 Q^{D,u,m}$ is lower than that of computing $\Delta Q^{D,u}$. The latter requires computing $w^+_{gu}$, $w^+_{lu}$, $w^-_{gu}$, and $w^-_{lu}$, which is $O(|C_i|)$, where $|C_i|$ represents the number of nodes in community $i$. The cost of computing $\Delta^2 Q^{D,u,m}$ is reduced to $O(1)$.



C. Measuring communal overlap 

In Section II, we optimize the DAMM on a given network, recover the detected communities, $C^d$, and assess the similarity of $C^d$ to the known communities, $C^k$. For this final step, which compares two community configurations, we introduce a measure that we call communal overlap. The statistic quantifies the similarity between two sets of communities and is used to assess the success of each experiment.

The foundation for communal overlap is the Jaccard index, $J(A, B) = |A \cap B| / |A \cup B|$, which measures the similarity between two sets, say $A$ and $B$. Each community, $C_i \in C$, is a set of nodes. Thus, the Jaccard index provides a means for comparing two different community configurations. Let $C^k(n)$ and $C^d(n)$ represent the known and detected communities containing node $n$. Then, if $A = C^k(n)$ and $B = C^d(n)$, the Jaccard index measures their similarity. In this context, greater similarity implies better detection. Communal overlap, shown in equation (9), is the average of the Jaccard indices over all $N$ nodes in the network. The higher the value of communal overlap, $\Omega \in (0, 1]$, the greater the similarity between the communal configurations $C^k$ and $C^d$, where $\Omega = 1$ represents a perfect match. $\Omega = 0$ is unattainable because, at the very least, for all $n$, $C^k(n)$ and $C^d(n)$ share the node $n$, and thus $|C^k(n) \cap C^d(n)| > 0$.

$$\Omega = \frac{1}{N} \sum_{n=1}^{N} \frac{|C^k(n) \cap C^d(n)|}{|C^k(n) \cup C^d(n)|} \tag{9}$$
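Equation (9) translates directly into code. In this sketch, each community configuration is given as a list of per-node community labels (an assumed representation; the function name is illustrative). Labels need not match between the known and detected configurations, because only the node sets are compared.

```python
def communal_overlap(known, detected):
    """Equation (9): mean Jaccard index of each node's known and detected community.

    known, detected: per-node community labels (lists of equal length).
    """
    if len(known) != len(detected) or not known:
        raise ValueError("label lists must be non-empty and of equal length")
    members_k, members_d = {}, {}
    for node, (ck, cd) in enumerate(zip(known, detected)):
        members_k.setdefault(ck, set()).add(node)
        members_d.setdefault(cd, set()).add(node)
    total = 0.0
    for ck, cd in zip(known, detected):
        a, b = members_k[ck], members_d[cd]
        total += len(a & b) / len(a | b)
    return total / len(known)


print(communal_overlap([0, 0, 1, 1], [5, 5, 7, 7]))  # 1.0: same partition
print(communal_overlap([0, 0, 1, 1], [0, 0, 0, 0]))  # 0.5: communities merged
```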



II. EXPERIMENTAL RESULTS 

In this section, we report on three experiments. The first two study stochastically generated networks with a prescribed community structure. The third experiment involves a real-world network, the 2005 National Football League (NFL) schedule. In each case, we measure the ability of the DAMM-enabled EO algorithm to recover known community structure.



A. Experiment I: independent contributions of positive and 
negative edges 

1. Generating networks stochastically 

In our tests, we generated networks with $N = 64$ nodes and $|C| = 4$ communities of equal size (16 nodes each). Once the communities are established, both positive and negative edges are added to the network. By default, positive edges are added between nodes of the same community and negative edges are added between nodes of different communities. An exception to this rule involves false positives and false negatives, discussed in Section II.B.

The stochastic generation algorithm uses two parameters: the mean number of intra-community edges per node, $z_{in}$ (both nodes in the same community), and the mean number of inter-community edges per node, $z_{out}$ (nodes in different communities). Intra-community edges are assigned an edge weight of $e_{ij} = 1$, and inter-community edges are negatively weighted ($e_{ij} = -1$).

For any node $n \in N$ there are $|N| - 1$ possible edges, discounting self-loops. To generate the positively weighted edges, a pseudo-random number $r_{in} \in [0, 1)$ is generated for each potential intra-community edge. If $r_{in} < p_{in}$, the edge is added, where $p_{in}$ is the probability of an intra-community edge existing: $p_{in} = z_{in} / (|C_i| - 1)$. We follow a similar procedure for negative edges. Each potential inter-community edge is generated with probability $p_{out} = z_{out} / (|N| - |C_i|)$.
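A minimal Python sketch of this generator follows. It assumes that nodes are numbered so that consecutive blocks of $|C_i|$ nodes form a community and that edges are stored in a dictionary keyed by node pairs $(r, s)$ with $r < s$; the function name and its defaults (four communities of 16 nodes) are illustrative.

```python
import random


def generate_da_network(n_comm=4, size=16, z_in=5.0, z_out=16.0, seed=None):
    """Stochastic dual assortative network as described in Section II.A.1.

    Positive edges (+1) join nodes of the same community with probability
    p_in = z_in / (size - 1); negative edges (-1) join nodes of different
    communities with probability p_out = z_out / (N - size).
    """
    rng = random.Random(seed)
    N = n_comm * size
    community = [r // size for r in range(N)]
    p_in = z_in / (size - 1)
    p_out = z_out / (N - size)
    edges = {}
    for r in range(N):
        for s in range(r + 1, N):
            if community[r] == community[s]:
                if rng.random() < p_in:
                    edges[(r, s)] = 1
            elif rng.random() < p_out:
                edges[(r, s)] = -1
    return edges, community
```

With $z_{in} = 15$, every intra-community pair is drawn with probability $p_{in} = 1$, so each community becomes a clique of 120 edges.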



2. Experimental setup 

We refer to the mean cumulative degree of a node, which combines both the intra-community and inter-community degrees, as $z_{cum} = z_{in} + z_{out}$. For the first experiment, we generated a series of networks with $z_{cum} = 16$. While the value of $z_{cum}$ was held fixed, $z_{in} \in [0, 16]$ and $z_{out} \in [0, 16]$ were varied inversely such that $z_{out} = 16 - z_{in}$. For each $(z_{in}, z_{out})$ parameter setting, 100 independent networks were generated. We refer to these networks as being dual assortative (DA). Our goal is to compare the independent contributions of the positive and negative edges of the DA networks. To this end, from each DA network, we extracted the embedded positive assortative (PA) and negative assortative (NA) networks. To extract the PA network, all negative edges were removed from the original DA network. In contrast, to establish the NA network, all positive edges were removed from the DA network. For each network, whether DA, PA, or NA, the DAMM was optimized using the EO algorithm and the communal overlap $\Omega$ was assessed.
Any value of $z_{in} \geq |C_i| - 1$ yields full intra-community connectivity ($p_{in} \geq 1$). Thus, for $z_{in} = 15$ and $z_{in} = 16$, the intra-community sub-networks are fully connected. On the other hand, the maximum value of $z_{out} = 16$ covers merely one-third of the potential inter-community edge space, since each node has $|N| - |C_i| = 48$ potential inter-community partners.



3. Results 

Figure 2 shows the results of the first experiment. On the lower x-axis, the positive degree, $z_{in}$, is displayed. On the top x-axis, the negative degree, $z_{out}$, is shown. The y-axis represents the mean communal overlap, $\langle \Omega \rangle = \frac{1}{|G|} \sum_{i=1}^{|G|} \Omega(G_i)$, for the set of generated networks $G$ corresponding to the specified $(z_{in}, z_{out})$ setting. The solid curve shows results for the DA networks. For all $(z_{in}, z_{out})$ settings, $\langle \Omega_{DA} \rangle > 0.95$: optimization of the DAMM on the DA networks detects the known communities with high fidelity. The dashed curve shows DAMM-optimized PA networks. For $z_{in} > 4$, the PA networks yield $\langle \Omega_{PA} \rangle > 0.95$, which is comparable to the DA networks. However, for $z_{in} < 4$, the communal overlap values for the PA networks are significantly lower than those observed for the DA networks. This deficiency highlights the importance of the negative edges that are removed to create the PA networks. By removing these edges, information used by the DAMM is lost, and as a result, the detection process suffers. The dotted curve presents the results for the NA networks. Note that only for $z_{out} = 16$ does $\langle \Omega_{NA} \rangle > 0.95$. For $z_{out} \leq 10$, $\langle \Omega_{NA} \rangle < 0.5$, which means that, on average, there exists less than a 50% overlap between the detected and known communities.

The distance between the NA (dotted) and DA (solid) curves of Figure 2 highlights the importance of the positive edges that were removed from the original DA networks. The mean distance between the DA (solid) and PA (dashed) curves is 0.084 units of communal overlap. By comparison, the mean distance between the DA and NA curves is 0.50 units of communal overlap. Over the investigated parameter range, removing the positive edges from the DA networks therefore has a significantly greater deleterious impact on community detection than removing the negative edges.

These results provide a proof of principle for the DAMM. Regardless of the $(z_{in}, z_{out})$ setting, optimization of the DAMM yields a high communal overlap ($\langle \Omega_{DA} \rangle > 0.95$) on the DA networks. When either the positive or negative edges are removed, the detection process suffers. Note that if we simply optimize the original modularity measure on the DA networks, the contributions of the negative edges are ignored. Optimizing the DAMM on the PA networks achieves the equivalent: since the negative edges have been removed, the negative information is unavailable to the DAMM. Without the negative edges, the communal overlap drops. By incorporating the contributions of both positive and negative edges, the DAMM outperforms the original modularity measure. Furthermore, we have demonstrated high-fidelity community detection using only the negative edges: at $(z_{in} = 0, z_{out} = 16)$, optimization of the DAMM yields $\langle \Omega_{NA} \rangle > 0.95$.
FIG. 2: Communal overlap comparison for dual assortative (solid),
positive assortative (dashed), and negative assortative (dotted) net- 
works. Each data point represents the mean communal overlap for 
optimization of the DAMM on 100 independent, computer generated 
networks. 



B. Experiment II: the impact of false positives and false 
negatives 

The second experiment introduces "false" edges to the net- 
works: false positives and false negatives. A false positive 
is a positively weighted edge that connects nodes in different 
communities. In the language of friends and adversaries, a 
false positive indicates a friendship between people of differ- 
ent communities. A false negative occurs when two nodes of 
the same community are connected by a negatively weighted 
edge. This occurs when there is an adversarial relationship 
between two people of the same community. False positives 
and false negatives are routinely found in real-world networks. 
Their presence contributes to the challenge of detecting com- 
munities. 

To assess the impact of the false positives and false negatives, we generated DA networks with a fixed $(z_{in}, z_{out})$ setting, and then added exclusively either false positives or false negatives. For this experiment, we did not disassemble the DA networks into their PA and NA constituents. The community detection algorithm, which optimizes the DAMM, was applied to each DA network and the communal overlap $\Omega$ was computed.



1. Generating false positives and false negatives 

To include false positives and false negatives, we introduce two additional parameters: $f^+$, the mean number of false positives per node, and $f^-$, the mean number of false negatives per node. To generate false positives, we assess the unused negative edge space following the initial edge generation phase (Section II.A.1). Assume that $E^-$ represents the entire negative space considered for a given node in the initial phase. We refer to the unused subset of this space as $U^- \subseteq E^-$ and establish the probability $p^+ = f^+ / |U^-|$. For each potential edge $e_{ij} \in U^-$, we generate a random number $r^+ \in [0, 1)$. If $r^+ < p^+$, a positive edge is added between $i$ and $j$ such that $e_{ij} = 1$, where $i$ and $j$ are known to reside in different communities. The generation of false negatives follows a similar procedure; however, $f^-$ dictates the likelihood of adding negatively weighted edges between nodes residing in the same community.
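The following sketch adds false edges to a network stored as a dictionary keyed by node pairs $(r, s)$ with $r < s$. Because $|U^-|$ is defined per node in the text, the sketch assumes a single probability based on the mean number of unused candidate partners per node, which yields roughly $f^+$ (or $f^-$) false edges per node; this simplification and the function name are our own.

```python
import random


def add_false_edges(edges, community, f_plus=0.0, f_minus=0.0, seed=None):
    """Add false positives (+1 between communities) or false negatives
    (-1 within communities) on node pairs left unused by the base network.

    edges: dict keyed by node pairs (r, s) with r < s; community: per-node labels.
    Assumption: |U| is the mean number of unused candidate partners per node,
    so p = f / |U| yields about f false edges per node.
    """
    if f_plus > 0 and f_minus > 0:
        raise ValueError("only one type of false edge is added per network")
    rng = random.Random(seed)
    n = len(community)
    across = f_plus > 0                  # false positives join different communities
    f, weight = (f_plus, 1) if across else (f_minus, -1)
    unused = [(r, s) for r in range(n) for s in range(r + 1, n)
              if (community[r] != community[s]) == across and (r, s) not in edges]
    result = dict(edges)
    if f <= 0 or not unused:
        return result
    p = f / (2 * len(unused) / n)        # each unused pair serves two nodes
    for pair in unused:
        if rng.random() < p:
            result[pair] = weight
    return result
```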



2. Experimental setup 

We generated base networks with three different settings: $(z_{in} = 5, z_{out} = 16)$, $(z_{in} = 7, z_{out} = 16)$, and $(z_{in} = 5, z_{out} = 22)$. The first parameter pair was chosen because the degree represents one-third connectivity within both the intra-community and inter-community subspaces. For a given node, since $|C_i| = 16$, the maximum number of intra-community edges is $z_{in} = 15$ and the maximum number of inter-community edges is $z_{out} = 48$. Figure 2 shows that $z_{in} = 5$ represents the approximate threshold for which $\langle \Omega_{PA} \rangle > 0.95$, and $z_{out} = 16$ the threshold for which $\langle \Omega_{NA} \rangle > 0.95$. The other two parameter settings were chosen to analyze the effect of independently increasing intra-community or inter-community connectivity. The second pair of parameters, $(z_{in} = 7, z_{out} = 16)$, was chosen to highlight the effect of increasing $z_{in}$ while $z_{out}$ is maintained. The increase from $z_{in} = 5$ to $z_{in} = 7$ represents a 13% increment in intra-community coverage (two of the 15 possible partners). Analogously, the third parameter pair, $(z_{in} = 5, z_{out} = 22)$, represents a 13% increase in inter-community coverage (six of the 48 possible partners) and allows us to analyze the effect of increasing $z_{out}$ while holding $z_{in}$ steady. To these base networks, we independently added either false positives or false negatives. Accordingly, the edge generation parameter space is extended to $(z_{in}, z_{out}, f^+, f^-)$ with $f^+ \in [0, 8]$ and $f^- \in [0, 8]$. None of the networks contains both false positives and false negatives: the false edges added to any one network are all of a single type. Thus, if $f^+ > 0$, then $f^- = 0$; conversely, if $f^- > 0$, then $f^+ = 0$. For each parameter setting, forty networks were created, each with a different random number generator seed.



3. Results

Figure 3 shows the results on networks with false positives and false negatives. In the top graph, corresponding to $(z_{in} = 5, z_{out} = 16, f^+, f^-)$, we see that both curves are similar, although the detrimental effect of the false negatives is slightly greater. As expected, as the rate of either false positives or false negatives increases, $\langle \Omega \rangle$ decreases. When only a couple of false edges are added, the known communities are detected without a significant drop-off. For $f^+ < 3$ and $f^- < 3$, the mean communal overlap exceeds $\langle \Omega \rangle = 0.95$. Beyond this range, however, the detection rate suffers. With reference to Table I, we see that the mean communal overlap for both curves, $M = \langle (\Omega^+ + \Omega^-)/2 \rangle$, is 0.67.

In the middle graph, corresponding to $(z_{in} = 7, z_{out} = 16, f^+, f^-)$, the mean degree of intra-community edges is increased from $z_{in} = 5$ to $z_{in} = 7$, and the effect of the additional positive edges is observable. Note that for $f^+ < 6$ and $f^- < 4$, the mean communal overlap $\langle \Omega \rangle > 0.95$. Thus, the range of high-fidelity detection has been extended. Comparison to the top graph highlights another effect of the additional information: the mean distance between the false positives curve and the false negatives curve, denoted $\langle \delta \rangle$, has increased. With reference to Table I, $\langle \delta \rangle = .058$ for $(z_{in}, z_{out}) = (5, 16)$ as compared to $\langle \delta \rangle = .113$ for $(7, 16)$. Further, the increase in $z_{in}$ improves the mean communal overlap over the range of both curves from $M = .67$ for $(5, 16)$ to $M = .82$ for $(7, 16)$.

The bottom graph presents results for $(z_{in} = 5, z_{out} = 22, f^+, f^-)$, for which, in comparison to the top graph, $z_{out}$ is increased and $z_{in}$ is unchanged. Similar to the effect observed in the middle graph, $M$ increases in comparison to the initial parameter setting ($M = .79$ for $(5, 22)$ as compared to $M = .67$ for $(5, 16)$). However, unlike the case for increasing $z_{in}$ (middle graph), the mean distance between the curves, $\langle \delta \rangle$, does not differ significantly ($\langle \delta \rangle = .069$ for $(5, 22)$ compared to $.058$ for $(5, 16)$).

The second experiment establishes that the independent impact of false positives and false negatives is influenced by the composition of the network. An increase in either $z_{in}$ or $z_{out}$ lessens the detrimental effect of either false positives or false negatives, as demonstrated by the $M$ values of Table I. Further, the additional edges (relating to $z_{in}$ and $z_{out}$) appear to have an asymmetric effect on the impact of false positives and false negatives. The increase in $z_{in}$ significantly widens the gap between the false positives curve and the false negatives curve ($\langle \delta \rangle = .113$ for $(7, 16)$). The increase in $z_{out}$ has a much less pronounced effect on the distance between the two curves ($\langle \delta \rangle = .069$ for $(5, 22)$).

TABLE I: Results for false positives and negatives.

z_in   z_out   ⟨Ω+⟩   ⟨Ω-⟩   ⟨δ⟩    M = ⟨(Ω+ + Ω-)/2⟩
5      16      .70    .64    .058   .67
7      16      .88    .77    .113   .82
5      22      .83    .76    .069   .79
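The two summary columns of Table I can be computed from the sampled curves. The sketch below assumes both curves are sampled at the same values of $f$ and takes $\langle \delta \rangle$ to be the mean absolute pointwise difference between them; that reading of "mean distance" is an assumption, and the function name is illustrative. Because $M$ averages $(\Omega^+ + \Omega^-)/2$, it equals the average of the two column means, e.g. $(.70 + .64)/2 = .67$ for the first row.

```python
def summarize_curves(omega_fp, omega_fn):
    """Table I summaries for two curves sampled at the same f values.

    M     : mean of (Omega+ + Omega-) / 2 over the sampled points
    delta : mean absolute distance between the curves (assumed definition)
    """
    if len(omega_fp) != len(omega_fn) or not omega_fp:
        raise ValueError("curves must be non-empty and of equal length")
    pairs = list(zip(omega_fp, omega_fn))
    M = sum((p + q) / 2 for p, q in pairs) / len(pairs)
    delta = sum(abs(p - q) for p, q in pairs) / len(pairs)
    return M, delta
```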



C. Experiment III: 2005 National Football League schedule 

The third experiment uses a real dataset: the 2005 National Football League (NFL) schedule. From the dataset, we construct networks representing the correlation of team schedules. Each team is represented by a node. Edges between nodes are weighted to indicate the correlation of the two teams' schedules. Teams that play similar opponents show positive correlations; teams that play dissimilar schedules are negatively correlated. The network therefore contains both positively and negatively weighted edges and is thus dual assortative. Because the correlation between any two team schedules can be computed, the network is fully connected. However, the objective of the experiment is to examine the efficacy of optimizing the DAMM on partial representations of the dual assortative network. Accordingly, only a subset of the possible edges is represented in any given generated network.
FIG. 3: The effect of false positives (solid) and false negatives (dashed) on communal overlap. The top graph represents base networks with $(z_{in} = 5, z_{out} = 16)$; the middle graph, networks with $(z_{in} = 7, z_{out} = 16)$; and the bottom graph, networks with $(z_{in} = 5, z_{out} = 22)$. For each graph, the y-axis represents the mean communal overlap value for forty networks. The x-axis represents the mean number of either false positives or false negatives added to the networks.

The NFL consists of thirty-two teams split into two conferences. Within each conference, teams are grouped into four divisions of four teams apiece. Each team plays sixteen games. Six of these games result from a team playing its three division rivals twice each. In addition, each division is paired with one other division of the same conference and with one division of the other conference. For each team, these division-versus-division games account for eight additional games (bringing the running tally to fourteen games). The final two opponents for each team are teams from the same conference that are not involved in the division-versus-division matchup. In total, each team faces thirteen unique opponents.
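The per-team tally above can be checked directly. The last two quantities relate the 32 teams to the 992 edges of the fully connected network reported below; reading 992 as 32 × 31, that is, every ordered pair of distinct teams, is our assumption. All variable names are illustrative.

```python
# Per-team schedule arithmetic for the 2005 NFL format described above.
CONFERENCES = 2
DIVISIONS_PER_CONFERENCE = 4
TEAMS_PER_DIVISION = 4

division_rivals = TEAMS_PER_DIVISION - 1       # 3 rivals, each played twice
paired_same_conference = TEAMS_PER_DIVISION    # whole paired division, same conference
paired_other_conference = TEAMS_PER_DIVISION   # whole paired division, other conference
remaining_same_conference = 2                  # final two same-conference opponents

total_games = (2 * division_rivals + paired_same_conference
               + paired_other_conference + remaining_same_conference)
unique_opponents = (division_rivals + paired_same_conference
                    + paired_other_conference + remaining_same_conference)

n_teams = CONFERENCES * DIVISIONS_PER_CONFERENCE * TEAMS_PER_DIVISION
ordered_team_pairs = n_teams * (n_teams - 1)   # both directions, no self-loops

print(total_games, unique_opponents, n_teams, ordered_team_pairs)  # 16 13 32 992
```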

Through extensive analysis using the EO algorithm, we identified four optimal and two near-optimal communal alignments for the fully connected NFL schedule correlation network. We refer to these communal alignments as the known optimal configurations. The four optimal alignments each consist of three communities (one community of 8 teams and two communities of 12 teams apiece). In each case, the 8-team community comprises two divisions from the same conference that are pitted against each other in a division-versus-division matchup. Each 12-team community contains three divisions, one of which is involved in an intra-conference division-versus-division matchup with one of the other two divisions and in an inter-conference division-versus-division matchup with the remaining division. Both near-optimal communal alignments consist of four communities. In one case, each community contains two divisions pitted against each other in an inter-conference division-versus-division matchup; in the other, each community consists of two divisions pitted against each other in an intra-conference division-versus-division matchup.

With regard to the four optimal communal alignments, the NFL schedule correlation network contains both false positives and false negatives. In each alignment, there exist pairs of teams that share positively weighted edges but belong to different communities; these edges constitute the false positives. Further, in each alignment, there exist pairs of teams within the same community that share negatively weighted edges; these edges constitute the false negatives.
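Given a communal alignment, the false positives and false negatives defined above can be counted directly. A sketch with a hypothetical four-node network; the function name and the edge-dictionary layout (each undirected edge listed once with its signed weight) are illustrative assumptions.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def count_false_edges(
    weights: Dict[Edge, float], community: Dict[Hashable, int]
) -> Tuple[int, int]:
    """Count false positives and false negatives relative to a partition.

    A false positive is a positively weighted edge between two communities;
    a false negative is a negatively weighted edge inside a single community.
    """
    false_positives = false_negatives = 0
    for (r, s), w in weights.items():
        same_community = community[r] == community[s]
        if w > 0 and not same_community:
            false_positives += 1
        elif w < 0 and same_community:
            false_negatives += 1
    return false_positives, false_negatives


# Hypothetical example with communities {a, b} and {c, d}.
edges = {("a", "b"): 0.5, ("a", "c"): 0.3, ("c", "d"): -0.2,
         ("b", "d"): -0.4, ("a", "d"): 0.1}
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
print(count_false_edges(edges, labels))  # (2, 1)
```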



1. Generation of networks 

To study the performance of the DAMM on the NFL network, we first optimized it on various partial representations of the NFL schedule correlation network. We then compared the detected communal alignment to the set of known optimal configurations and identified the closest match. The best communal overlap score from this series of comparisons was recorded as the communal overlap value.
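The best-match scoring step can be sketched as follows. The communal overlap measure itself is defined elsewhere in the paper, so the sketch takes the overlap function as a parameter; the Rand index used in the example is a plainly swapped-in stand-in, not the paper's measure, and the partitions are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, Iterable

Partition = Dict[Hashable, Hashable]  # node -> community label


def rand_index(p: Partition, q: Partition) -> float:
    """Stand-in overlap: fraction of node pairs on which p and q agree.

    This is the Rand index, used only as a placeholder for the paper's
    communal overlap. Both partitions must cover the same nodes.
    """
    pairs = list(combinations(p, 2))
    agree = sum((p[a] == p[b]) == (q[a] == q[b]) for a, b in pairs)
    return agree / len(pairs)


def best_match_overlap(
    detected: Partition,
    known: Iterable[Partition],
    overlap: Callable[[Partition, Partition], float],
) -> float:
    """Score `detected` against every known configuration; keep the best score."""
    return max(overlap(detected, config) for config in known)


detected = {0: "x", 1: "x", 2: "y", 3: "y"}
known_configs = [
    {0: 1, 1: 2, 2: 1, 3: 2},  # different grouping
    {0: 1, 1: 1, 2: 2, 3: 2},  # same grouping, different labels
]
print(best_match_overlap(detected, known_configs, rand_index))  # 1.0
```

Because the overlap is label-free, the second configuration matches perfectly even though its community labels differ.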

The fully connected NFL schedule correlation network contains 992 edges (discounting self-loops). Of these edges, 352 (35 percent) are positively weighted and 640 are negatively weighted. We generated the partial representations using a procedure similar to the edge generation algorithm used to generate the partial representations described in section III A. Instead of computing probability thresholds (such as $p_{in}$ and $p_{out}$) from a mean degree (such as $z_{in}$ and $z_{out}$), we used the probability thresholds directly as parameters. Each positively weighted edge of the full representation was selected with probability $p^+$, and each negatively weighted edge was selected with probability $p^- = 1 - p^+$.
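A minimal sketch of this edge-sampling step, assuming the full network is stored as a dictionary that lists each undirected edge once with its signed weight (the text does not specify a data structure, and the function name is ours). By default `p_minus` follows the coupling $p^- = 1 - p^+$ stated above; passing it explicitly decouples the two probabilities.

```python
import random
from typing import Dict, Hashable, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def sample_partial_network(
    full: Dict[Edge, float],
    p_plus: float,
    p_minus: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Dict[Edge, float]:
    """Draw one partial representation of a signed network.

    Each positively weighted edge is kept with probability p_plus and each
    negatively weighted edge with probability p_minus (default 1 - p_plus).
    Zero-weight entries are never kept.
    """
    if p_minus is None:
        p_minus = 1.0 - p_plus
    rng = rng if rng is not None else random.Random()
    partial = {}
    for edge, w in full.items():
        if w == 0:
            continue
        keep_probability = p_plus if w > 0 else p_minus
        if rng.random() < keep_probability:  # random() lies in [0, 1)
            partial[edge] = w
    return partial


full = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.2, (2, 3): -0.7}
only_positive = sample_partial_network(full, p_plus=1.0)  # p_minus = 0.0
only_negative = sample_partial_network(full, p_plus=0.0)  # p_minus = 1.0
mixed = sample_partial_network(full, p_plus=0.5, rng=random.Random(42))
```

The two extreme settings reproduce the endpoints of the sweep: at $p^+ = 1$ only positive edges survive, and at $p^+ = 0$ only negative ones.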

Because of this stochasticity, the individual nodes of a partial representation will, with high probability, have varying degrees. As a result of this asymmetry, certain nodes are more difficult to classify than others. These asymmetries can yield situations in which the optimal DAMM configuration found experimentally does not agree with the known optimal configurations. In such a case, a sub-optimal communal overlap results.

Our goal is to examine whether, on average, the DA information utilized by the DAMM leads to better community detection than either the PA or the NA information in isolation. Recall that we create the PA network by removing all negative edges from the corresponding DA network, whereas for the NA network we remove all positive edges from the DA network.
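Deriving the PA and NA networks from a DA network amounts to filtering edges by sign. A minimal sketch, using the same hypothetical edge-dictionary representation (each undirected edge listed once with its signed weight); the function name is illustrative.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def split_dual_assortative(
    da: Dict[Edge, float]
) -> Tuple[Dict[Edge, float], Dict[Edge, float]]:
    """Return (PA, NA) networks derived from a dual assortative (DA) network.

    PA keeps only the positively weighted edges; NA keeps only the negatively
    weighted ones. Weights are copied unchanged, signs included.
    """
    pa = {edge: w for edge, w in da.items() if w > 0}
    na = {edge: w for edge, w in da.items() if w < 0}
    return pa, na


da = {("A", "B"): 0.8, ("A", "C"): -0.4, ("B", "C"): -0.1, ("C", "D"): 0.6}
pa, na = split_dual_assortative(da)
```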



2. Experimental setup 

We explored the parameter range $p^+ \in [0, 1]$ and $p^- \in [0, 1]$. All of the studied networks were partial representations of the fully connected network. For each $(p^+, p^-)$ setting, we generated 40 networks. As in the experiment of section III A, in each case we separately optimized the DAMM on the DA network, the PA network, and the NA network.
3. Results

Figure 4 gives the results of the third experiment. The bottom x-axis represents the $p^+$ values; the top x-axis represents the $p^-$ values. The y-axis represents the mean communal overlap, $\langle\Omega\rangle$, for the corresponding $(p^+, p^-)$ thresholds. Each data point represents the mean communal overlap resulting from optimization of the DAMM on 40 independent, randomly generated partial networks with the same prescribed thresholds.

For each $(p^+, p^-)$ setting, optimization on the DA networks yields an equal or higher $\langle\Omega\rangle$ value than for either the PA or NA networks. Only at $(p^+ = 1, p^- = 0)$ and $(p^+ = 0, p^- = 1)$ do $\langle\Omega_{PA}\rangle = \langle\Omega_{DA}\rangle$ and $\langle\Omega_{NA}\rangle = \langle\Omega_{DA}\rangle$, respectively. At these settings there are either exclusively positive or exclusively negative edges, so these DA networks are equivalent to the respective PA or NA cases. For all parameter settings at which there are both positive and negative edges, the DAMM uses both types of information and achieves higher mean communal overlap values. Despite the presence of 1.8 times as many negative edges as positive edges (640 versus 352), the positive edges provide more information for community detection. Using only negative edges, at $(p^+ = 0, p^- = 1)$, optimization of the DAMM detects a sub-optimal communal alignment. On the other hand, using only positive edges, at $(p^+ = 1, p^- = 0)$, optimization yields an optimal mean communal overlap value. Figure 4 highlights this asymmetry in the amount of information provided by the positive edges as compared to the negative edges. The mean distance between the DA and PA curves is 0.20 communal overlap units, whereas the mean distance between the DA and NA curves is 0.34 communal overlap units; the positive edges contribute more to the community detection process. As expected, as $p^+$ increases, $\langle\Omega_{PA}\rangle$ increases. Similarly, as $p^-$ increases, $\langle\Omega_{NA}\rangle$ increases.



FIG. 4: Community overlap measures for the 2005 NFL schedule network. The bottom x-axis represents $p^+$ and the top x-axis represents $p^-$. The y-axis represents the mean communal overlap, $\langle\Omega\rangle$. The different curves correspond to the different types of networks on which the DAMM is optimized: DA networks (solid), PA networks (dashed), and NA networks (dotted). Optimization of the DAMM on the DA networks yields mean communal overlap values as good as or better than those for either the PA or NA networks. By utilizing both the positively and negatively weighted edges, optimization of the DAMM provides better community detection than the original modularity measure, which operates only on positively weighted edges.

III. SUMMARY AND CONCLUSIONS

The DAMM provides a way to assess community structure in networks containing both positively and negatively weighted edges. This extends the paradigm of the friendship network to that of a friends-and-adversaries network. Negative information, previously ignored, now provides useful additional information to community detection algorithms.

The efficacy of the DAMM was demonstrated on both stochastically generated synthetic networks and a real-world example based on the 2005 NFL schedule. Furthermore, the experiments revealed an asymmetry in the information provided by positive and negative edges. This asymmetry is due to the greater specificity of a positive edge when more than two communities exist: a positive edge indicates that its two endpoints share a community, whereas a negative edge indicates only that they do not, which still leaves several communities open to each endpoint.

The contributions of the DAMM are two-fold. First, networks containing solely negative information can now be analyzed. Second, the DAMM improves community detection in networks containing both positive and negative information. An example of such a network is one in which edge weights are based on a similarity metric, such as correlation, that can assume either positive or negative values. The NFL schedule correlation network of section III C provides a real-world example. The DAMM expands the domain of problems to which community detection algorithms can be applied.



APPENDIX A: DERIVATION OF $\Delta Q^{D}$



As expressed in equation (5), the DAMM, $Q^D$, involves summing the independent contributions of all communities $i$. We can represent the contribution of a given community $i$ with a local DAMM value, denoted $q_i$, such that $Q^D = \sum_i q_i$. Further, if we wish to assess the positive and negative edge weight contributions of each community independently, we can write $Q^D = \sum_i \left(q^+_i + q^-_i\right)$.
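A minimal sketch of the decomposition $Q^D = \sum_i (q^+_i + q^-_i)$. It assumes the positive local term has the form of equation (A8), $q^+_i = 2w^+_{ii}/T^+ - (a^+_i/T^+)^2$, and, as our own assumption, that the negative local term is $q^-_i = -[2w^-_{ii}/T^- - (a^-_i/T^-)^2]$ computed on weight magnitudes, which is the form consistent with the signs in equation (A18). Here $w_{ii}$ counts each intra-community edge once, $a_i$ sums member degrees, and $T$ sums all degrees; equation (5) remains the authoritative definition. Function names and the toy network are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def local_damm(
    weights: Dict[Edge, float], community: Dict[Hashable, int]
) -> Dict[int, Tuple[float, float]]:
    """Return {i: (q_i_plus, q_i_minus)} for every community label i.

    Each undirected edge appears once in `weights`; self-loops are not allowed.
    ASSUMPTION: q_i_minus is the negated modularity term of the negative edges,
    computed on weight magnitudes.
    """
    positive = {e: w for e, w in weights.items() if w > 0}
    negative = {e: -w for e, w in weights.items() if w < 0}  # magnitudes
    labels = set(community.values())
    local = {i: [0.0, 0.0] for i in labels}
    for k, (edges, sign) in enumerate(((positive, 1.0), (negative, -1.0))):
        total = 2.0 * sum(edges.values())  # T: sum of all degrees
        if total == 0:
            continue  # no edges of this sign: contribution stays zero
        for i in labels:
            members = {n for n, c in community.items() if c == i}
            # w_ii: each intra-community edge counted once
            w_ii = sum(w for (r, s), w in edges.items() if r in members and s in members)
            # a_i: sum of member degrees (each endpoint inside i contributes w)
            a_i = sum(w * ((r in members) + (s in members)) for (r, s), w in edges.items())
            local[i][k] = sign * (2.0 * w_ii / total - (a_i / total) ** 2)
    return {i: (qp, qm) for i, (qp, qm) in local.items()}


def damm_from_local(local: Dict[int, Tuple[float, float]]) -> float:
    """Q^D as the sum of the positive and negative local contributions."""
    return sum(qp + qm for qp, qm in local.values())


# Hypothetical toy network: three positive and two negative edges.
toy = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 1.0, (0, 2): -1.0, (1, 3): -1.0}
local = local_damm(toy, {0: 0, 1: 0, 2: 1, 3: 1})
# By hand: each community contributes q+ = 1/12 and q- = 1/4, so Q^D = 2/3.
```

Grouping nodes along the negative edges instead, $\{0, 2\}$ and $\{1, 3\}$, gives $Q^D = -1$ for the same toy network, illustrating how the negative term penalizes negative edges placed inside a community.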

Computing $\Delta Q^{D,u}$ entails comparing the DAMM value at timestep $t$ to the DAMM value at time $t+1$ that results from migrating node $u$. We denote the DAMM value at time $t$ as $Q^D_t$ and the DAMM value following the migration of node $u$ as $Q^D_{t+1(u)}$. Accordingly, $\Delta Q^{D,u}$ can be expressed as:

$$\Delta Q^{D,u} = Q^D_{t+1(u)} - Q^D_t \qquad \text{(A1)}$$

Using the local DAMM notation, we can rewrite equation (A1) as:

$$\Delta Q^{D,u} = \sum_i \left(q^+_{i,t+1(u)} + q^-_{i,t+1(u)}\right) - \sum_i \left(q^+_{i,t} + q^-_{i,t}\right) \qquad \text{(A2)}$$

where $q^+_{i,t}$ represents the positive edge contribution to the local DAMM value of community $i$ at time $t$ (before migrating node $u$), and $q^+_{i,t+1(u)}$ represents the positive edge contribution of the same community following the migration of node $u$. The subscript $t+1(u)$ indicates that node $u$ has been migrated; $q^-_{i,t}$ and $q^-_{i,t+1(u)}$ are the corresponding negative edge contributions.

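By grouping the positive local DAMM values and the negative local DAMM values separately, we can write:

$$\begin{aligned}
\Delta Q^{D,u} &= \sum_i \left(q^+_{i,t+1(u)} - q^+_{i,t}\right) + \sum_i \left(q^-_{i,t+1(u)} - q^-_{i,t}\right) && \text{(A3)}\\
&= \Delta Q^{+,u} + \Delta Q^{-,u} && \text{(A4)}
\end{aligned}$$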

Moving a single vertex from one community to another, as is done during the division process, affects only two communities; all other communities remain unchanged. One community gains a new node. We refer to this community as the gain community and denote it $C_g$. Conversely, the other affected community loses a node; we denote this loss community $C_l$. Since only $C_g$ and $C_l$ are affected by a vertex move, the only local DAMM values that change are those relating to these communities, $q_g$ and $q_l$. The local DAMM contributions of all other communities, $\{q_i \mid i \neq g, i \neq l\}$, remain unchanged. Thus, we need only assess the change in DAMM for the gain and loss communities. Using this information, we rewrite $\Delta Q^{+,u}$ as:

$$\begin{aligned}
\Delta Q^{+,u} &= \sum_i \left(q^+_{i,t+1(u)} - q^+_{i,t}\right) && \text{(A5)}\\
&= \left(q^+_{g,t+1(u)} - q^+_{g,t}\right) + \left(q^+_{l,t+1(u)} - q^+_{l,t}\right) && \text{(A6)}\\
&= \Delta q^{+,u}_g + \Delta q^{+,u}_l && \text{(A7)}
\end{aligned}$$



where $q^+_{g,t}$ denotes the local DAMM value for the positive edges associated with the gain community at time $t$, $q^+_{l,t}$ denotes the local DAMM value for the positive edges of the loss community at time $t$, and $q^+_{g,t+1(u)}$ represents the local DAMM value for the positive edge contributions of the gain community following the migration of node $u$.

We can now separately analyze the contributions of $\Delta q^{+,u}_g$ and $\Delta q^{+,u}_l$ and reassemble the terms to establish $\Delta Q^{+,u}$. We denote the node to be moved as $u$ and introduce the following notation: $w^+_{gg} = \sum \{ e^+_{rs} \mid r, s \in C_g,\ r < s \}$ represents the cumulative intra-community positive edge weight of the gain community (each intra-community edge counted once), $w^+_{gu} = \sum \{ e^+_{ru} \mid r \in C_g \}$ represents the cumulative positive edge weight connecting the gain community to node $u$, and $d^+_u$ represents the positive degree of node $u$. Note that $q^+_{g,t}$, which represents the contribution of the positive edges in the gain community prior to moving the node, is defined as:

$$q^+_{g,t} = \frac{2 w^+_{gg}}{T^+} - \left(\frac{a^+_g}{T^+}\right)^2 \qquad \text{(A8)}$$



We use the notation $q^+_{g,t+1(u)}$ to denote the contribution of the gain community's positive edges after migrating node $u$. Following the migration, the gain community contains an additional node. Accordingly, both the cumulative intra-community positive edge weight, $w^+_{gg}$, and the total cumulative positive edge weight of the gain community, $a^+_g$, are subject to change. More specifically, the intra-community positive edge weight is updated as $w^+_{gg,t+1(u)} = w^+_{gg,t} + w^+_{gu,t}$, where $w^+_{gu,t}$ represents the cumulative positive edge weight connecting node $u$ to the gain community prior to the migration. Furthermore, the cumulative positive edge weight of the gain community is updated as $a^+_{g,t+1(u)} = a^+_{g,t} + d^+_u$. Using these updates, we define $q^+_{g,t+1(u)}$ as:

$$q^+_{g,t+1(u)} = \frac{2\left(w^+_{gg} + w^+_{gu}\right)}{T^+} - \left(\frac{a^+_g + d^+_u}{T^+}\right)^2 \qquad \text{(A9)}$$



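By subtracting equation (A8) from equation (A9), we establish $\Delta q^{+,u}_g$ as provided in equation (A11):

$$\begin{aligned}
\Delta q^{+,u}_g &= q^+_{g,t+1(u)} - q^+_{g,t} && \text{(A10)}\\
&= \frac{2 w^+_{gu}}{T^+} - \frac{2 a^+_g d^+_u + \left(d^+_u\right)^2}{(T^+)^2} && \text{(A11)}
\end{aligned}$$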



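Similarly, the positive edge contribution of the loss community, denoted $\Delta q^{+,u}_l$, is given by equation (A14), where $w^+_{ll}$, $w^+_{lu}$, and $a^+_l$ are defined for the loss community analogously to $w^+_{gg}$, $w^+_{gu}$, and $a^+_g$:

$$\begin{aligned}
\Delta q^{+,u}_l &= q^+_{l,t+1(u)} - q^+_{l,t} && \text{(A12)}\\
&= \left[\frac{2\left(w^+_{ll} - w^+_{lu}\right)}{T^+} - \left(\frac{a^+_l - d^+_u}{T^+}\right)^2\right] - \left[\frac{2 w^+_{ll}}{T^+} - \left(\frac{a^+_l}{T^+}\right)^2\right] && \text{(A13)}\\
&= -\frac{2 w^+_{lu}}{T^+} + \frac{2 a^+_l d^+_u - \left(d^+_u\right)^2}{(T^+)^2} && \text{(A14)}
\end{aligned}$$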



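By assembling equations (A11) and (A14), we establish $\Delta Q^{+,u}$ as provided by equation (A16):

$$\begin{aligned}
\Delta Q^{+,u} &= \Delta q^{+,u}_g + \Delta q^{+,u}_l && \text{(A15)}\\
&= 2\left[\frac{w^+_{gu} - w^+_{lu}}{T^+} + \frac{d^+_u}{(T^+)^2}\left(a^+_l - a^+_g - d^+_u\right)\right] && \text{(A16)}
\end{aligned}$$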



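Following similar logic, we establish $\Delta Q^{-,u}$. For brevity, we provide only the result, equation (A18), where $w^-_{gu}$ represents the cumulative negative edge weight of the gain community connected to node $u$, $w^-_{lu}$ the corresponding quantity for the loss community, $a^-_g$ and $a^-_l$ the cumulative negative edge weights of the gain and loss communities, and $d^-_u$ the negative degree of node $u$:

$$\begin{aligned}
\Delta Q^{-,u} &= \Delta q^{-,u}_g + \Delta q^{-,u}_l && \text{(A17)}\\
&= 2\left[\frac{w^-_{lu} - w^-_{gu}}{T^-} - \frac{d^-_u}{(T^-)^2}\left(a^-_l - a^-_g - d^-_u\right)\right] && \text{(A18)}
\end{aligned}$$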

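Finally, we establish $\Delta Q^{D,u}$ by assembling equations (A16) and (A18):

$$\begin{aligned}
\Delta Q^{D,u} &= \Delta Q^{+,u} + \Delta Q^{-,u} && \text{(A19)}\\
&= 2\left[\frac{w^+_{gu} - w^+_{lu}}{T^+} + \frac{d^+_u}{(T^+)^2}\left(a^+_l - a^+_g - d^+_u\right)\right] + 2\left[\frac{w^-_{lu} - w^-_{gu}}{T^-} + \frac{d^-_u}{(T^-)^2}\left(a^-_g - a^-_l + d^-_u\right)\right] && \text{(A20)}
\end{aligned}$$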
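Because equation (A20) is what makes evaluating a candidate move cheap, it can be checked against a brute-force recomputation of $Q^D$. The sketch below does so under two assumptions of ours: edges are stored once each in a dictionary of signed weights, and the negative local term is $q^-_i = -[2w^-_{ii}/T^- - (a^-_i/T^-)^2]$ on weight magnitudes, the form consistent with equation (A18). It exercises the algebra of (A20), not the authors' implementation; all names are hypothetical.

```python
def _split(weights):
    """Positive weights and negative-weight magnitudes, each edge listed once."""
    pos = {e: w for e, w in weights.items() if w > 0}
    neg = {e: -w for e, w in weights.items() if w < 0}
    return pos, neg


def _degree(edges, node):
    return sum(w for (r, s), w in edges.items() if node in (r, s))


def _link(edges, node, members):
    """Weight between `node` and the other nodes in `members` (w_gu or w_lu)."""
    return sum(w for (r, s), w in edges.items()
               if (r == node and s in members and s != node)
               or (s == node and r in members and r != node))


def _modularity_term(edges, community):
    total = 2.0 * sum(edges.values())  # T: sum of all degrees
    if total == 0:
        return 0.0
    q = 0.0
    for label in set(community.values()):
        members = {n for n, c in community.items() if c == label}
        w_in = sum(w for (r, s), w in edges.items() if r in members and s in members)
        a = sum(_degree(edges, n) for n in members)
        q += 2.0 * w_in / total - (a / total) ** 2
    return q


def damm(weights, community):
    """Brute-force Q^D (ASSUMED negative term: negated modularity of |negative| edges)."""
    pos, neg = _split(weights)
    return _modularity_term(pos, community) - _modularity_term(neg, community)


def delta_damm(weights, community, u, g):
    """Equation (A20): change in Q^D when node u moves from its community to g."""
    loss_label = community[u]
    gain = {n for n, c in community.items() if c == g}
    loss = {n for n, c in community.items() if c == loss_label}
    pos, neg = _split(weights)
    change = 0.0
    for edges, sign in ((pos, 1.0), (neg, -1.0)):
        total = 2.0 * sum(edges.values())
        if total == 0:
            continue
        d_u = _degree(edges, u)
        a_g = sum(_degree(edges, n) for n in gain)
        a_l = sum(_degree(edges, n) for n in loss)
        bracket = ((_link(edges, u, gain) - _link(edges, u, loss)) / total
                   + d_u / total ** 2 * (a_l - a_g - d_u))
        change += sign * 2.0 * bracket  # the negative bracket of (A20) has flipped signs
    return change


def max_discrepancy(weights, community):
    """Largest |A20 - brute force| over all single-node moves between existing communities."""
    worst = 0.0
    labels = set(community.values())
    for u in community:
        for g in labels - {community[u]}:
            moved = dict(community)
            moved[u] = g
            brute = damm(weights, moved) - damm(weights, community)
            worst = max(worst, abs(delta_damm(weights, community, u, g) - brute))
    return worst


toy = {(0, 1): 1.0, (2, 3): 1.0, (1, 2): 1.0, (0, 2): -1.0, (1, 3): -1.0}
partition = {0: 0, 1: 0, 2: 1, 3: 1}
```

For the toy network, moving node 1 from $\{0, 1\}$ into $\{2, 3\}$ changes $Q^D$ by $-43/72$: $-2/9$ from the positive edges and $-3/8$ from the negative edges. In a real implementation the appeal of (A20) is that it needs only quantities of the two affected communities; this sketch recomputes those sums naively for clarity.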



APPENDIX B: DERIVATION OF $\Delta^2 Q^{D}$

We first concentrate solely on the contributions of the positive edges and analyze the difference $\Delta^2 Q^{+,um} = \Delta Q^{+,u}_{t+1(m)} - \Delta Q^{+,u}_{t}$, where $\Delta Q^{+,u}_{t+1(m)}$ represents the change in DAMM that would result from the migration of node $u$ if node $m$ had already been migrated from the current configuration. $\Delta Q^{+,u}_{t}$, which pertains exclusively to positive edges, was provided in equation (A16). To simplify the derivation, we assess the first and second terms of equation (A16) independently, writing $A^u_t = w^+_{gu} - w^+_{lu}$ and $B^u_t = a^+_l - a^+_g - d^+_u$, so that $\Delta Q^{+,u}_t = 2\left[A^u_t/T^+ + \left(d^+_u/(T^+)^2\right) B^u_t\right]$.

After node $m$ is migrated, the value of $A^u$ may change, and this change must be reflected in $A^u_{t+1(m)}$. Note the two terms involved in $A^u_t$: $w^+_{gu}$ and $w^+_{lu}$. Each measures the cumulative positive edge weight connecting a community (either the gain or the loss community) to node $u$ prior to the migration of node $m$. The migration of node $m$ may or may not alter the cumulative positive edge weight between each community and node $u$. If node $m$ is migrated to the community not currently occupied by $u$ (the gain community), $w^+_{gu}$ increases such that $w^+_{gu,t+1(m)} = w^+_{gu,t} + e^+_{mu}$, where $e^+_{mu}$ represents the positive edge weight between nodes $m$ and $u$. Otherwise, if node $m$ moves in the opposite direction, $w^+_{gu,t+1(m)} = w^+_{gu,t} - e^+_{mu}$. Using the direction indicator $D$ ($D = +1$ in the first case and $D = -1$ in the second), we can write $w^+_{gu,t+1(m)} = w^+_{gu,t} + D e^+_{mu}$. Similarly, $w^+_{lu,t+1(m)} = w^+_{lu,t} - D e^+_{mu}$. By assembling the two terms, we express $A^u_{t+1(m)}$ as:

$$A^u_{t+1(m)} = \left(w^+_{gu} + D e^+_{mu}\right) - \left(w^+_{lu} - D e^+_{mu}\right) \qquad \text{(B1)}$$

By subtracting $A^u_t$ from $A^u_{t+1(m)}$, we establish $\Delta A^u$ as shown in equation (B4):

$$\begin{aligned}
\Delta A^u &= A^u_{t+1(m)} - A^u_t && \text{(B2)}\\
&= \left[\left(w^+_{gu} + D e^+_{mu}\right) - \left(w^+_{lu} - D e^+_{mu}\right)\right] - \left(w^+_{gu} - w^+_{lu}\right) && \text{(B3)}\\
&= 2 D e^+_{mu} && \text{(B4)}
\end{aligned}$$

The first two terms of $B^u_t$, $a^+_l$ and $a^+_g$, are similarly affected by the migration of node $m$. These terms represent the cumulative positive edge weight of the loss and gain communities, respectively. The updated terms can be expressed as $a^+_{l,t+1(m)} = a^+_{l,t} - D d^+_m$ and $a^+_{g,t+1(m)} = a^+_{g,t} + D d^+_m$, where $d^+_m$ represents the positive degree of node $m$. Accordingly, we can write $B^u_{t+1(m)}$ as:

$$B^u_{t+1(m)} = \left(a^+_l - D d^+_m\right) - \left(a^+_g + D d^+_m\right) - d^+_u \qquad \text{(B5)}$$

By subtracting $B^u_t$ from $B^u_{t+1(m)}$, we establish $\Delta B^u$ as seen in equation (B8):

$$\begin{aligned}
\Delta B^u &= B^u_{t+1(m)} - B^u_t && \text{(B6)}\\
&= \left[\left(a^+_l - D d^+_m\right) - \left(a^+_g + D d^+_m\right) - d^+_u\right] - \left(a^+_l - a^+_g - d^+_u\right) && \text{(B7)}\\
&= -2 D d^+_m && \text{(B8)}
\end{aligned}$$

We now use $\Delta A^u$ and $\Delta B^u$ to establish the positive edge contribution to $\Delta^2 Q^{D,um}$:

$$\begin{aligned}
\Delta^2 Q^{+,um} &= \frac{2}{T^+}\,\Delta A^u + \frac{2 d^+_u}{(T^+)^2}\,\Delta B^u && \text{(B9)}\\
&= \frac{2}{T^+}\left(2 D e^+_{mu}\right) + \frac{2 d^+_u}{(T^+)^2}\left(-2 D d^+_m\right) && \text{(B10)}\\
&= 4D\left(\frac{e^+_{mu}}{T^+} - \frac{d^+_u d^+_m}{(T^+)^2}\right) && \text{(B11)}
\end{aligned}$$

Following a similar approach, it is possible to establish the negative edge contribution to $\Delta^2 Q^{D,um}$:

$$\Delta^2 Q^{-,um} = 4D\left(\frac{d^-_u d^-_m}{(T^-)^2} - \frac{e^-_{mu}}{T^-}\right) \qquad \text{(B12)}$$

By assembling $\Delta^2 Q^{+,um}$ of equation (B11) and $\Delta^2 Q^{-,um}$ of equation (B12), we establish the generalized equation:

$$\begin{aligned}
\Delta^2 Q^{D,um} &= \Delta^2 Q^{+,um} + \Delta^2 Q^{-,um} && \text{(B13)}\\
&= 4D\left(\frac{e^+_{mu}}{T^+} - \frac{d^+_u d^+_m}{(T^+)^2} + \frac{d^-_u d^-_m}{(T^-)^2} - \frac{e^-_{mu}}{T^-}\right) && \text{(B14)}
\end{aligned}$$